average improvement
Cross-lingual Retrieval for Iterative Self-Supervised Training
Recent studies have demonstrated the cross-lingual alignment ability of multilingual pretrained language models. In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs. We utilized these findings to develop a new approach --- cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. Using this method, we achieved state-of-the-art unsupervised machine translation results on 9 language directions with an average improvement of 2.4 BLEU, and on the Tatoeba sentence retrieval task in the XTREME benchmark on 16 languages with an average improvement of 21.5% in absolute accuracy. Furthermore, CRISS also brings an additional 1.8 BLEU improvement on average compared to mBART, when finetuned on supervised machine translation downstream tasks.
Skillful joint probabilistic weather forecasting from marginals
Alet, Ferran, Price, Ilan, El-Kadi, Andrew, Masters, Dominic, Markou, Stratis, Andersson, Tom R., Stott, Jacklynn, Lam, Remi, Willson, Matthew, Sanchez-Gonzalez, Alvaro, Battaglia, Peter
Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming traditional ensembles in global probabilistic weather forecasting. This paper presents FGN, a simple, scalable and flexible modeling approach which significantly outperforms the current state-of-the-art models. FGN generates ensembles via learned model-perturbations with an ensemble of appropriately constrained models. It is trained directly to minimize the continuous rank probability score (CRPS) of per-location forecasts. It produces state-of-the-art ensemble forecasts as measured by a range of deterministic and probabilistic metrics, makes skillful ensemble tropical cyclone track predictions, and captures joint spatial structure despite being trained only on marginals.
Enhancing Disinformation Detection with Explainable AI and Named Entity Replacement
González-Silot, Santiago, Montoro-Montarroso, Andrés, Cámara, Eugenio Martínez, Gómez-Romero, Juan
The automatic detection of disinformation presents a significant challenge in the field of natural language processing. This task addresses a multifaceted societal and communication issue, which needs approaches that extend beyond the identification of general linguistic patterns through data-driven algorithms. In this research work, we hypothesise that text classification methods are not able to capture the nuances of disinformation and they often ground their decision in superfluous features. Hence, we apply a post-hoc explainability method (SHAP, SHapley Additive exPlanations) to identify spurious elements with high impact on the classification models. Our findings show that non-informative elements (e.g., URLs and emoticons) should be removed and named entities (e.g., Rwanda) should be pseudo-anonymized before training to avoid models' bias and increase their generalization capabilities. We evaluate this methodology with internal dataset and external dataset before and after applying extended data preprocessing and named entity replacement. The results show that our proposal enhances on average the performance of a disinformation classification method with external test data in 65.78% without a significant decrease of the internal test performance.
Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation
Human-in-the-loop (HITL) feedback mechanisms can significantly enhance machine learning models, particularly in financial fraud detection, where fraud patterns change rapidly, and fraudulent nodes are sparse. Even small amounts of feedback from Subject Matter Experts (SMEs) can notably boost model performance. This paper examines the impact of HITL feedback on both traditional and advanced techniques using proprietary and publicly available datasets. Our results show that HITL feedback improves model accuracy, with graph-based techniques benefiting the most. We also introduce a novel feedback propagation method that extends feedback across the dataset, further enhancing detection accuracy. By leveraging human expertise, this approach addresses challenges related to evolving fraud patterns, data sparsity, and model interpretability, ultimately improving model robustness and streamlining the annotation process.
Cross-lingual Retrieval for Iterative Self-Supervised Training
Recent studies have demonstrated the cross-lingual alignment ability of multilingual pretrained language models. In this work, we found that the cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs. We utilized these findings to develop a new approach --- cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. Using this method, we achieved state-of-the-art unsupervised machine translation results on 9 language directions with an average improvement of 2.4 BLEU, and on the Tatoeba sentence retrieval task in the XTREME benchmark on 16 languages with an average improvement of 21.5% in absolute accuracy. Furthermore, CRISS also brings an additional 1.8 BLEU improvement on average compared to mBART, when finetuned on supervised machine translation downstream tasks.
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
Xiao, Bin, Gui, Lujun, Su, Lei, Chen, Weipeng
Large Language Models (LLMs) frequently suffer from inefficiencies, largely attributable to the discord between the requirements of auto-regressive decoding and the architecture of contemporary GPUs. Recently, regressive lightweight speculative decoding has garnered attention for its notable efficiency improvements in text generation tasks. This approach utilizes a lightweight regressive draft model, like a Recurrent Neural Network (RNN) or a single transformer decoder layer, leveraging sequential information to iteratively predict potential tokens. Specifically, RNN draft models are computationally economical but tend to deliver lower accuracy, while attention decoder layer models exhibit the opposite traits. This paper presents Clover-2, an advanced iteration of Clover, an RNN-based draft model designed to achieve comparable accuracy to that of attention decoder layer models while maintaining minimal computational overhead. Clover-2 enhances the model architecture and incorporates knowledge distillation to increase Clover's accuracy and improve overall efficiency. We conducted experiments using the open-source Vicuna 7B and LLaMA3-Instruct 8B models. The results demonstrate that Clover-2 surpasses existing methods across various model architectures, showcasing its efficacy and robustness.
Augmentations vs Algorithms: What Works in Self-Supervised Learning
Morningstar, Warren, Bijamov, Alex, Duvarney, Chris, Friedman, Luke, Kalibhat, Neha, Liu, Luyang, Mansfield, Philip, Rojas-Gomez, Renan, Singhal, Karan, Green, Bradley, Prakash, Sushant
We study the relative effects of data augmentations, pretraining algorithms, and model architectures in Self-Supervised Learning (SSL). While the recent literature in this space leaves the impression that the pretraining algorithm is of critical importance to performance, understanding its effect is complicated by the difficulty in making objective and direct comparisons between methods. We propose a new framework which unifies many seemingly disparate SSL methods into a single shared template. Using this framework, we identify aspects in which methods differ and observe that in addition to changing the pretraining algorithm, many works also use new data augmentations or more powerful model architectures. We compare several popular SSL methods using our framework and find that many algorithmic additions, such as prediction networks or new losses, have a minor impact on downstream task performance (often less than $1\%$), while enhanced augmentation techniques offer more significant performance improvements ($2-4\%$). Our findings challenge the premise that SSL is being driven primarily by algorithmic improvements, and suggest instead a bitter lesson for SSL: that augmentation diversity and data / model scale are more critical contributors to recent advances in self-supervised learning.
Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems
Lilienthal, Derek, Mello, Paul, Eirinaki, Magdalini, Tiomkin, Stas
While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work we introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models in synthesizing various datasets to replace or augment the original data by an average improvement of 4.30% in Recall@$k$ and 4.65% in NDCG@$k$.
Predicting the First Response Latency of Maintainers and Contributors in Pull Requests
Khatoonabadi, SayedHassan, Abdellatif, Ahmad, Costa, Diego Elias, Shihab, Emad
The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Our best-performing models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors compared to a no-skilled classifier across the projects. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
Toward smart production: Machine intelligence in business operations
As the superintendent of Vistra Corp's Luminant Martin Lake Power Plant, Wayne Brown is an expert in power generation. Vistra is the largest competitive power producer in the United States, operating power plants in 12 states and producing more than 39 gigawatts of electricity--enough to power nearly 20 million homes. The company has been on a journey to drive operational excellence across its generation portfolio. Launched in 2016, its Operational Performance Initiative has driven a step-change improvement in the efficiency of its assets, generating hundreds of millions in incremental EBITDA along the way. This article is a collaborative effort by Duane S. Boning, Vijay D'Silva, Pete Kimball, Bruce Lawler, Retsef Levi, and Ingrid Millan, representing views from McKinsey's Operations Practice and the Massachusetts Institute of Technology's Machine Intelligence for Manufacturing and Operations program. To maintain and improve its position, Vistra is continually looking for the tools, technologies, and approaches that will help it achieve the next level of performance. Most recently, the company has turned to digital and analytics, including machine intelligence (MI). It measures how much electricity is generated for each ton of fuel consumed by the plant.